Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate serialize_v0 to new API (as part of serialization layer) #190

Merged
merged 5 commits into from
Jun 5, 2024

Conversation

mihaimaruseac
Copy link
Collaborator

@mihaimaruseac mihaimaruseac commented May 28, 2024

Summary

Migrate serialize_v0 to new API.

This is the middle layer of the API design work (#172). We add a manifest abstract class to represent various manifests (#111 #112) and also ways to serialize a model directory into manifests and ways to verify the manifests.

For now, this only does what was formerly known as serialize_v0. The v1 and the manifest versions will come soon.

Note: This has a lot of inspiration from #112, but makes the API work with all the usecases we need to consider right now.

Release Note

NONE

Documentation

NONE

@mihaimaruseac mihaimaruseac force-pushed the api-manifest branch 2 times, most recently from 3da46a9 to a7d78ea Compare May 29, 2024 13:45
@mihaimaruseac mihaimaruseac mentioned this pull request May 30, 2024
mihaimaruseac added a commit to mihaimaruseac/model-transparency that referenced this pull request Jun 3, 2024
Missed this in sigstore#188, but found out I need it when working on sigstore#190. The
`serialize_v0`/`serialize_v1` methods all had headers in front of the
files, so we need to do that too. Will update usage of header on sigstore#190
shortly.

As a benefit, we can simulate hashing a file with a header for the first
portion of the file and a sharded hasher for the remainder of the file.

Signed-off-by: Mihai Maruseac <[email protected]>
@mihaimaruseac mihaimaruseac force-pushed the api-manifest branch 3 times, most recently from ac1f654 to 4cae6b3 Compare June 4, 2024 00:49
This is the middle layer of the API design work (sigstore#172). We add a manifest abstract class to represent various manifests (sigstore#111 sigstore#112) and also ways to serialize a model directory into manifests and ways to verify the manifests.

For now, this only does what was formerly known as `serialize_v0`. The v1 and the manifest versions will come soon.

Note: This has a lot of inspiration from sigstore#112, but makes the API work with all the usecases we need to consider right now.

Signed-off-by: Mihai Maruseac <[email protected]>
@mihaimaruseac mihaimaruseac changed the title [WIP] Add API for manifest and serialization Migrate serialize_v0 to new API (as part of serialization layer) Jun 4, 2024
@mihaimaruseac mihaimaruseac added this to the V1 release milestone Jun 4, 2024
@mihaimaruseac
Copy link
Collaborator Author

Opening this one for review now while I'm working on v1 and manifest (v2)

@mihaimaruseac mihaimaruseac marked this pull request as ready for review June 4, 2024 00:57
@mihaimaruseac mihaimaruseac requested a review from a team as a code owner June 4, 2024 00:57
Signed-off-by: Mihai Maruseac <[email protected]>
model_signing/manifest/manifest.py Show resolved Hide resolved
model_signing/serializing/dfs.py Show resolved Hide resolved
model_signing/serializing/dfs.py Outdated Show resolved Hide resolved
model_signing/serializing/dfs.py Show resolved Hide resolved
model_signing/serializing/dfs.py Show resolved Hide resolved
model_signing/serializing/dfs.py Show resolved Hide resolved

assert manifest == new_manifest

def test_file_model_hash_changes_if_content_changes(
Copy link
Collaborator

@laurentsimon laurentsimon Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be a good idea to be more thorough how we update models: iterate over each character and each file name to make sure we have everything covered. Too easy to make mistakes :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. It will slow down testing and the only thing it can catch is if any hashing engine has collisions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may uncover bugs for more complicated serializations, eg with sharding and different parameters. Here you're right it's simple enough that we're probably OK. Note that we could add just a couple test function and small models to minimize performance during testing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do it for end-to-end testing (#5), added a comment there to refer to this

model_signing/serializing/dfs_test.py Show resolved Hide resolved
Signed-off-by: Mihai Maruseac <[email protected]>
Signed-off-by: Mihai Maruseac <[email protected]>
@mihaimaruseac
Copy link
Collaborator Author

Currently blocked by DCO (dcoapp/app#211). Will make new PRs on top of this one and rebase when this gets merged.

@mihaimaruseac mihaimaruseac merged commit e9071f1 into sigstore:main Jun 5, 2024
18 checks passed
@mihaimaruseac mihaimaruseac deleted the api-manifest branch June 5, 2024 16:59


@dataclass
class DigestManifest(Manifest):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does a model look like as a manifest?
As the digest manifest is defined now, it contains a single digest. Shouldn't it contain a list of digests to represent the model?

Where would we place the signature in this manifest?

Perhaps we could align a bit more with the sigstore bundle type and least provide a model manifest that contains the digests, the signature and verification material.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on that now, hope to have a PR later today/early tomorrow. This PR and the one after it only migrated the old serialize_v0/serialize_v1 versions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants